Abstract — The basic polarization phenomenon for i.i.d. sources is extended to a framework allowing dependencies within and between multiple sources. In particular, it is shown that taking the polar transform of a random matrix with i.i.d. columns of arbitrary (correlated) distribution allows one to extract the randomness and dependencies. This result is then used to develop polar coding schemes (having low complexity) for: (1) distributed data compression, i.e., Slepian-Wolf coding (without decomposing the problem into single-user problems), (2) compression of sources with memory, (3) compression of sources on finite fields, extending the polarization phenomenon for alphabets of prime cardinality to powers of primes.

I. Introduction

A new technique called 'polarization' has recently been introduced in [3] to develop efficient channel coding schemes. The codes resulting from this technique, called polar codes, have several nice attributes: (1) they are linear codes generated by a low-complexity deterministic matrix; (2) they can be analyzed mathematically, and bounds on the error probability (exponential in the square root of the block length) can be proved; (3) they have low encoding and decoding complexity; (4) they reach the Shannon capacity on any discrete memoryless channel (DMC). These codes are indeed the first codes with low decoding complexity that are provably capacity-achieving on any DMC.

The key result in the development of polar codes is the so-called 'polarization phenomenon', initially shown in the channel setting in [3]. The same phenomenon admits a source setting formulation, as follows.

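Theorem 1. [3], [4] Let $X = [X_1, \dots, X_n]$ be i.i.d. Bernoulli($p$), $n$ be a power of 2, and $Y = XG_n$, where

$$G_n = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes \log_2(n)}.$$

Then, for any $\varepsilon \in (0,1)$,

$$\frac{1}{n}\left|\{ j \in [n] : H(Y_j \mid Y^{j-1}) > 1-\varepsilon \}\right| \xrightarrow[n\to\infty]{} H(p), \qquad (1)$$

where $H(p)$ is the entropy of a Bernoulli($p$) distribution.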
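To make Theorem 1 concrete for very small $n$, the sketch below computes the conditional entropies $H(Y_j \mid Y^{j-1})$ exactly by enumerating all $2^n$ inputs. It assumes NumPy; the function names are illustrative, and brute force is only meant for building intuition (up to about $n = 16$), not as a way to construct polar codes. At such small $n$ the entropies have only begun to spread toward 0 and 1.

```python
import itertools
import math

import numpy as np


def polar_matrix(n: int) -> np.ndarray:
    """G_n = [[1, 0], [1, 1]] Kronecker-powered log2(n) times (no bit reversal)."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    kernel = np.array([[1, 0], [1, 1]], dtype=np.int64)
    g = np.ones((1, 1), dtype=np.int64)
    while g.shape[0] < n:
        g = np.kron(kernel, g)
    return g


def conditional_entropies(p: float, n: int) -> list:
    """Exact H(Y_j | Y^{j-1}) in bits, j = 1..n, for X i.i.d. Bernoulli(p), Y = X G_n.

    Brute force over all 2^n inputs, so only usable for small n.
    """
    g = polar_matrix(n)
    pmf = {}
    for x in itertools.product((0, 1), repeat=n):
        y = tuple(int(v) for v in np.array(x) @ g % 2)
        ones = sum(x)
        pmf[y] = p ** ones * (1 - p) ** (n - ones)   # G_n is invertible: y is unique

    def prefix_entropy(j: int) -> float:
        """H(Y^j) = H(Y_1, ..., Y_j) from the marginal of the first j coordinates."""
        marg = {}
        for y, prob in pmf.items():
            marg[y[:j]] = marg.get(y[:j], 0.0) + prob
        return -sum(q * math.log2(q) for q in marg.values() if q > 0)

    h = [prefix_entropy(j) for j in range(n + 1)]
    # chain rule: H(Y_j | Y^{j-1}) = H(Y^j) - H(Y^{j-1})
    return [h[j] - h[j - 1] for j in range(1, n + 1)]


p, n, eps = 0.11, 8, 0.1
cond = conditional_entropies(p, n)
r_eps = [j + 1 for j, c in enumerate(cond) if c > 1 - eps]   # indices as in (2), 1-based
print([round(c, 3) for c in cond], r_eps)
```

Since $G_n$ is invertible, the computed entropies always sum to $H(Y) = H(X) = nH(p)$; polarization only redistributes this total among the indices.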

Note that (1) implies that the proportion of components $j$ for which $H(Y_j \mid Y^{j-1}) \in (\varepsilon, 1-\varepsilon)$ tends to 0. Hence most of the randomness has been extracted in about $nH(p)$ components having conditional entropy close to 1 and indexed by

$$R_\varepsilon(p) = \{ j \in [n] : H(Y_j \mid Y^{j-1}) > 1-\varepsilon \}, \qquad (2)$$

and besides $o(n)$ fluctuating components, the remaining $n(1 - H(p))$ components have conditional entropy below $\varepsilon$.

This theorem is extended in [?] to $X = [X_1, \dots, X_n]$ being i.i.d. from an arbitrary distribution $\mu$ on $\mathbb{F}_q$, where $q$ is a prime, replacing $H(p)$ by $H(\mu)$ (and using the logarithm in base $q$). It is however mentioned that the theorem may fail when $q$ is not a prime but a power of a prime, with a counter-example provided for $q = 4$. In Section III-B of this paper, we show a generalized version of the polarization phenomenon, i.e., of Theorem 1, for powers of primes (we show it explicitly for powers of 2, but the same holds for arbitrary primes). Also, the formulation of Theorem 1 is slightly more general in [4]: it includes an auxiliary random variable $Y$ (side information), which is correlated with $X$ but not intended to be compressed, and which is introduced in the conditioning of each entropy term. Although this formulation is mathematically close to Theorem 1, it is more suitable for an application to the Slepian-Wolf coding problem (distributed data compression), by reducing the problem to single-user source coding problems. A direct approach for this problem using polar codes is left open for future work in [4]; we investigate it here in Section III-A. Finally, we also generalize Theorem 1 to a setting allowing dependencies within the source (non-i.i.d. setting).

This paper provides a unified treatment of the three problems mentioned above, namely, the compression of multiple correlated sources, non-i.i.d. sources and non-binary sources. The main result of this paper is Theorem 2, where a "matrix polarization" shows how not only randomness but also dependencies can be extracted using $G_n$. Some results presented in this paper can be viewed as counterparts of the results in [2] for a source rather than a channel setting. Conversely, some results presented here in the source setting can be extended to a channel setting (such as channels with memory, or non-prime input alphabets). Finally, connections with extractors in computer science and the matrix completion problem in machine learning are discussed in Sections IV and V.

Some notations

• $[n] = \{1, 2, \dots, n\}$.

• For $x \in \mathbb{F}_2^k$ and $S \subseteq [k]$, $x[S] = [x_i : i \in S]$.

• For $x \in \mathbb{F}_2^k$, $x^i = [x_1, \dots, x_i]$.

• $\{0, 1, \dots, m\} \pm \varepsilon = [-\varepsilon, \varepsilon] \cup [1-\varepsilon, 1+\varepsilon] \cup \dots \cup [m-\varepsilon, m+\varepsilon]$.

• $H(X \mid Y) = \sum_{x,y} p_{X|Y}(x \mid y) \log\big(1/p_{X|Y}(x \mid y)\big)\, p_Y(y)$.

• For a matrix $A$, the matrix $A^{\otimes k}$ is obtained by taking $k$ Kronecker products of $A$ with itself.



II. Results 

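Definition 1. A random variable $Z$ over $\mathbb{F}_2^k$ is $\varepsilon$-uniform if $H(Z) \ge k(1-\varepsilon)$, and it is $\varepsilon$-deterministic if $H(Z) \le \varepsilon k$. We also say that $Z$ is $\varepsilon$-deterministic given $W$ if $H(Z \mid W) \le \varepsilon k$.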

Theorem 2. (1) Let $n$ be a power of 2 and $X$ be an $m \times n$ random matrix with i.i.d. columns of arbitrary distribution $\mu$ on $\mathbb{F}_2^m$. Let $Y = XG_n$, where $G_n = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes \log_2(n)}$. For any $\varepsilon > 0$, there exist two disjoint subsets of indices $R_\varepsilon, D_\varepsilon \subseteq [m] \times [n]$ with $|[m] \times [n] \setminus (R_\varepsilon \cup D_\varepsilon)| = o(n)$ such that the subset of entries $Y[R_\varepsilon]$ is $\varepsilon$-uniform and $Y[D_\varepsilon]$ is $\varepsilon$-deterministic given $Y[D_\varepsilon^c]$. (Hence $|R_\varepsilon| \approx nH(\mu)$ and $|D_\varepsilon| \approx n(m - H(\mu))$.)

(2) Moreover, the computation of $Y$, as well as the reconstruction of $X$ from the non-deterministic entries of $Y$, can be done in $O(n \log n)$, with an error probability of $O(2^{-n^\beta})$, $\beta < 1/2$, using the algorithm polar-matrix-dec.
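Part (2) relies on the Kronecker structure of $G_n$. For the encoding side only (computing $Y = XG_n$), a minimal sketch of the butterfly recursion follows; it assumes NumPy and stores $X$ as a 0/1 integer array, and it does not implement the decoder. Each of the $\log_2 n$ stages performs $n/2$ column XORs, so the cost is $O(mn \log n)$ bit operations, i.e., $O(n \log n)$ for fixed $m$.

```python
import numpy as np


def polar_transform(x) -> np.ndarray:
    """Y = X G_n over F_2 for an m x n binary matrix X, in O(m n log n) XORs.

    Uses G_{2k} = [[G_k, 0], [G_k, G_k]]: writing X = [A | B] (column halves),
    X G_{2k} = [(A + B) G_k | B G_k]; the same step is repeated inside each half.
    """
    y = np.atleast_2d(np.array(x, dtype=np.int64) % 2)
    n = y.shape[1]
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            y[:, start:start + half] ^= y[:, start + half:start + 2 * half]
        half //= 2
    return y
```

Since $G_n G_n = I$ over $\mathbb{F}_2$, applying `polar_transform` twice returns $X$, which is the inversion $X = YG_n$ used in the applications.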

Remarks.

• The multiplication $XG_n$ is over $\mathbb{F}_2$.

• The sets $R_\varepsilon, D_\varepsilon$ depend on the distribution $\mu$ (and on the dimensions $m$ and $n$), but not on the realization of $Y$. These sets can be accurately computed in linear time (cf. Section V).

• To achieve an error probability of $O(2^{-n^\beta})$, one picks $\varepsilon = \varepsilon_n = 2^{-n^\alpha}$, for some $\alpha$ with $\beta < \alpha < 1/2$.

The following lemma provides a characterization of the dependencies in the columns of $Y$; it is proved in Section VI-A. Recall that $Y_j$ denotes the $j$-th column of $Y$, $Y_j(i)$ the $(i,j)$ entry of $Y$, $Y_j[S] = [Y_j(i) : i \in S]$ and $Y^j = [Y_1, \dots, Y_j]$.

Lemma 1. For any $\varepsilon > 0$, we have

$$\frac{1}{n}\left|\{ j \in [n] : H(Y_j[S] \mid Y^{j-1}) \in \{0, 1, \dots, |S|\} \pm \varepsilon,\ \forall S \subseteq [m] \}\right| \xrightarrow[n\to\infty]{} 1.$$

This lemma implies the first part of Theorem 2, as shown in the next subsection. The second part of the theorem is proved in Section VI-B, together with the following result, which further characterizes the dependency structure of $Y$.

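Lemma 2. For any $\varepsilon > 0$ and $j \in [n]$, let $A_j$ denote the binary matrix of maximal rank such that

$$H(A_j Y_j \mid Y^{j-1}) \le \varepsilon.$$

Note that $A_j$ can have zero rank, i.e., $A_j$ can be a matrix filled with zeros. We then have

$$\frac{1}{n} \sum_{j=1}^{n} \mathrm{nullity}(A_j) \xrightarrow[n\to\infty]{} H(\mu).$$

Moreover, the result still holds when $\varepsilon = \varepsilon_n = 2^{-n^\alpha}$, for $\alpha < 1/2$.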

Note that, if $H(A_j Y_j \mid Y^{j-1}) \le \varepsilon$, then $A_j Y_j$ is $\varepsilon$-deterministic given $Y^{j-1}$. If $A_j$ has rank $r_j$, by freezing $k_j = m - r_j$ components of $Y_j$ appropriately, say on $B_j$, the product $A_j Y_j$ reduces (up to a known term depending on $Y_j[B_j]$) to a full-rank matrix multiplication acting on $Y_j[B_j^c]$, and hence $Y_j[B_j^c]$ is $\varepsilon$-deterministic given $Y^{j-1}$ and $Y_j[B_j]$. Hence the number of bits to freeze is exactly $\sum_j k_j = \sum_j \mathrm{nullity}(A_j)$, which, as stated in the lemma, corresponds to the total entropy of $Y$ (up to $o(n)$).


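A. Proof of Theorem 2 (part 1) and how to set $R_\varepsilon$ and $D_\varepsilon$

Let $\varepsilon > 0$ and let $E_n = E_n(\varepsilon)$ be the set of indices $j \in [n]$ for which $H(Y_j[S] \mid Y^{j-1}) \in \{0, 1, \dots, |S|\} \pm \varepsilon$ for any $S \subseteq [m]$. From Lemma 1, $n - |E_n| = o(n)$. Note that for $j \in E_n$, there exists a minimal set (not necessarily unique) $T_j$ such that

$$H(Y_j[T_j] \mid Y^{j-1}) \ge H(Y_j \mid Y^{j-1}) - \varepsilon, \qquad (3)$$

which also implies

$$H(Y_j[T_j] \mid Y^{j-1}) \ge |T_j| - \varepsilon, \qquad (4)$$

and, by the chain rule and defining $S_j := T_j^c$,

$$H(Y_j[S_j] \mid Y^{j-1}, Y_j[S_j^c]) \le \varepsilon. \qquad (5)$$

(Note that if $H(Y_j \mid Y^{j-1}) \le \varepsilon$, we define $T_j = \emptyset$, so that $S_j = [m]$.) We then have

$$H\Big(\bigcup_{j \in E_n} Y_j[S_j] \,\Big|\, \big(\bigcup_{j \in E_n} Y_j[S_j]\big)^c\Big) \le \sum_{j \in E_n} H(Y_j[S_j] \mid Y^{j-1}, Y_j[S_j^c]) \le \varepsilon n,$$

and $\bigcup_{j \in E_n} Y_j[S_j]$ is $\varepsilon$-deterministic given $\big(\bigcup_{j \in E_n} Y_j[S_j]\big)^c$, so that $D_\varepsilon = \{(i,j) : j \in E_n,\ i \in S_j\}$. Moreover, we have

$$\begin{aligned}
H(Y) &\ge H\Big(\bigcup_{j \in E_n} Y_j[T_j]\Big) \ge \sum_{j \in E_n} H(Y_j[T_j] \mid Y^{j-1}) \\
&\ge \sum_{j \in E_n} H(Y_j \mid Y^{j-1}) - \varepsilon n \\
&\ge \sum_{j=1}^{n} H(Y_j \mid Y^{j-1}) - \varepsilon n - o(n) \\
&= H(Y) - \varepsilon n - o(n), \qquad (6)
\end{aligned}$$

where the third inequality uses (3), and from (4),

$$\sum_{j \in E_n} |T_j| \ge H\Big(\bigcup_{j \in E_n} Y_j[T_j]\Big) \ge \sum_{j \in E_n} |T_j| - \varepsilon n.$$

Since $H(Y) = H(X) = nH(\mu)$ ($G_n$ is invertible), we have

$$nH(\mu) + \varepsilon n \ge \sum_{j \in E_n} |T_j| \ge nH(\mu) - \varepsilon n - o(n),$$

and, for $n$ large enough, $\bigcup_{j \in E_n} Y_j[T_j]$ is $\frac{\varepsilon}{H(\mu) - 2\varepsilon}$-uniform, so that $R_{\varepsilon/(H(\mu) - 2\varepsilon)} = \{(i,j) : j \in E_n,\ i \in T_j\}$.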
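The construction in this proof can be phrased as a small procedure. The sketch below assumes that, for each column $j$, a table of the conditional entropies $H(Y_j[S] \mid Y^{j-1})$ for all $S \subseteq [m]$ is already available (keyed by `frozenset`, 0-based indices, with the empty set mapped to 0); obtaining such tables is a separate problem. It tests membership in $E_n$, picks a minimum-cardinality $T_j$ satisfying (3) (one way to get a minimal set; ties are broken arbitrarily), sets $S_j$ to the complement, and collects the index sets the proof calls $R$ and $D_\varepsilon$ as sets of $(i, j)$ pairs. Function names are hypothetical.

```python
from itertools import combinations


def in_E_n(h: dict, eps: float) -> bool:
    """True if every H(Y_j[S] | Y^{j-1}) is within eps of some integer in {0, ..., |S|}."""
    return all(min(abs(v - k) for k in range(len(s) + 1)) <= eps
               for s, v in h.items())


def split_column(h: dict, m: int, eps: float) -> tuple:
    """Return (T_j, S_j): a minimum-size T_j satisfying (3), and S_j = [m] minus T_j.

    h maps frozenset(S) -> H(Y_j[S] | Y^{j-1}) for every S in [m] (0-based),
    including frozenset() -> 0. Searching subsets by increasing size makes the
    first hit minimal; T_j is empty when the whole column is eps-deterministic.
    """
    full = frozenset(range(m))
    target = h[full] - eps
    for size in range(m + 1):
        for t in combinations(range(m), size):
            if h[frozenset(t)] >= target:
                return frozenset(t), full - frozenset(t)
    raise ValueError("unreachable: T = [m] always satisfies (3)")


def chart(columns: list, m: int, eps: float) -> tuple:
    """Collect (R, D) as sets of (i, j) pairs from per-column entropy tables.

    Columns outside E_n are the o(n) unpolarized ones and go in neither set.
    """
    r_set, d_set = set(), set()
    for j, h in enumerate(columns):
        if not in_E_n(h, eps):
            continue
        t, s = split_column(h, m, eps)
        r_set |= {(i, j) for i in t}
        d_set |= {(i, j) for i in s}
    return r_set, d_set
```

For example, a column whose two entries are identical uniform bits contributes one index to $R$ and one to $D_\varepsilon$, while a column of two independent uniform bits contributes both indices to $R$.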

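B. Decoding algorithm

Definition 2. polar-matrix-dec
Inputs: $D \subseteq [m] \times [n]$, $y[D^c] \in \mathbb{F}_2^{|D^c|}$.
Output: $y \in \mathbb{F}_2^{m \times n}$.
Algorithm:

0. Let $M = D$;

1. Find the smallest $j$ such that $S_j = \{ i : (i,j) \in M \}$ is not empty; compute

$$y_j[S_j] = \arg\max_{u} P\{ Y_j[S_j] = u \mid Y^{j-1} = y^{j-1},\ Y_j[S_j^c] = y_j[S_j^c] \};$$

2. Update $M = M \setminus (S_j \times \{j\})$ and add the decoded values $y_j[S_j]$ to the known entries of $y$;

3. If $M$ is empty, output $y$; otherwise go back to 1.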
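To make the decision rule of step 1 concrete, here is a brute-force sketch for toy sizes only: it enumerates all $(2^m)^n$ input matrices to obtain the exact law of $Y$, which is exponential and unrelated to an efficient implementation. The distribution `mu` is given as a dict from $m$-bit tuples to probabilities, indices are 0-based, and all function names are illustrative.

```python
import itertools

import numpy as np


def transform(x: np.ndarray) -> np.ndarray:
    """Y = X G_n over F_2 (butterfly form; n must be a power of 2)."""
    y = np.array(x, dtype=np.int64) % 2
    half = y.shape[1] // 2
    while half >= 1:
        for s in range(0, y.shape[1], 2 * half):
            y[:, s:s + half] ^= y[:, s + half:s + 2 * half]
        half //= 2
    return y


def law_of_y(mu: dict, n: int) -> list:
    """All (Y, probability) pairs when the n columns of X are i.i.d. ~ mu."""
    out = []
    for cols in itertools.product(mu.items(), repeat=n):
        x = np.array([c for c, _ in cols], dtype=np.int64).T   # m x n
        out.append((transform(x), float(np.prod([w for _, w in cols]))))
    return out


def polar_matrix_dec(mu: dict, n: int, d_set: set, y_known: dict) -> np.ndarray:
    """Brute-force polar-matrix-dec: decode the entries in d_set column by column.

    y_known maps each (i, j) outside d_set to its stored bit. In column j, the
    entries S_j are set to the most likely value given Y^{j-1} and Y_j[S_j^c]
    (maximizing the joint probability maximizes the conditional one).
    """
    m = len(next(iter(mu)))
    table = law_of_y(mu, n)
    y = np.zeros((m, n), dtype=np.int64)
    for (i, j), bit in y_known.items():
        y[i, j] = bit
    for j in range(n):
        s_j = [i for i in range(m) if (i, j) in d_set]
        if not s_j:
            continue
        rest = [i for i in range(m) if i not in s_j]
        scores = {}
        for cand, prob in table:
            if (np.array_equal(cand[:, :j], y[:, :j])
                    and np.array_equal(cand[rest, j], y[rest, j])):
                u = tuple(cand[s_j, j])
                scores[u] = scores.get(u, 0.0) + prob
        y[s_j, j] = max(scores, key=scores.get)   # step 1 of Definition 2
    return y
```

With two identical uniform bits per column, for instance, storing the first row of $Y$ suffices: the decoder recovers the second row exactly, and $X$ follows from $X = YG_n$.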

Note that, with $S_j$ defined as in Section II-A (and the corresponding $D_\varepsilon$), the realizations of $Y^{j-1}$ and $Y_j[S_j^c]$ are known when column $j$ is processed, and with high probability one guesses $Y_j[S_j]$ correctly in step 1, because of (5). Moreover, due to the Kronecker structure of $G_n$, and similarly to [3], step 1 and the entire algorithm require only $O(n \log n)$ computations. Finally, from the proof of Theorem 2 part (2), it follows that step 1 can also be performed slightly differently, by finding sequentially the inputs $Y_j[i]$ for $i \in S_j$, reducing an optimization over all $2^{|S_j|}$ possible values of $y_j[S_j]$, where $|S_j|$ can be as large as $m$, to only $m$ optimizations over $\mathbb{F}_2$ (which may be useful for large $m$).

III. Three Applications

We now present three direct applications of Theorem 2:

• Distributed data compression, i.e., Slepian-Wolf coding

• Compression of sources on arbitrary finite fields

• Compression of non-i.i.d. sources

A. Source polarization for correlated sources: Slepian-Wolf coding

In [4], the two-user Slepian-Wolf coding problem is approached via polar codes by reducing the problem to single-user source coding problems. A direct approach is left open for future work; we investigate it here, for arbitrarily many users.

Consider $m$ binary sources which are correlated with an arbitrary distribution $\mu$. We are interested in compressing an i.i.d. output of these sources. That is, let $X_1, \dots, X_n$ be i.i.d. under $\mu$ on $\mathbb{F}_2^m$, i.e., $X_i$ is an $m$-dimensional binary random vector and, for example, $X_1[i], \dots, X_n[i]$ is the source output for user $i$. If we encode these sources jointly, a rate $H(\mu)$ is sufficient (and it is the lowest achievable rate). In [?], Slepian and Wolf showed that, even if the encoders are not able to cooperate, lossless compression can still be achieved at rate $H(\mu)$. We now show how to use Theorem 2 to achieve this rate with a polar coding scheme.

Polar codes for distributed data compression:

1. For a given $n$ and $\varepsilon$ (which sets the error probability), since each user knows the joint distribution $\mu$, each user can compute the "chart" of the deterministic indices, i.e., the set $D_\varepsilon \subseteq [m] \times [n]$, and identify its own chart $D_\varepsilon(i, \cdot)$.

2. Each user $i$ computes $Y(i, \cdot) = X(i, \cdot) G_n$ and stores $Y(i, \cdot)[D_\varepsilon(i, \cdot)^c]$, so that the joint decoder is in possession of $Y[D_\varepsilon^c]$ and can run polar-matrix-dec with $Y[D_\varepsilon^c]$ as input to get $Y$, with an error probability of at most $\varepsilon n$. Since $G_n$ is invertible, indeed $G_n^{-1} = G_n$, one can then find $X = YG_n$.
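Because $Y = XG_n$ multiplies $X$ on the right, row $i$ of $Y$ depends only on row $i$ of $X$, which is what lets each user encode separately. A minimal sketch of one user's encoder follows, assuming NumPy and a precomputed chart row given as a boolean mask (a hypothetical representation of $D_\varepsilon(i, \cdot)$); the mask in the checks is made up for illustration and is not derived from any $\mu$.

```python
import numpy as np


def polar_matrix(n: int) -> np.ndarray:
    """G_n = [[1, 0], [1, 1]]^{(x) log2 n} as a 0/1 integer matrix."""
    g = np.ones((1, 1), dtype=np.int64)
    while g.shape[0] < n:
        g = np.kron(np.array([[1, 0], [1, 1]], dtype=np.int64), g)
    return g


def user_encode(x_row: np.ndarray, g: np.ndarray, chart_row: np.ndarray) -> np.ndarray:
    """Encoder of user i: Y(i, .) = X(i, .) G_n, keeping only entries outside D_eps(i, .).

    chart_row is a boolean mask of length n, True on D_eps(i, .); it is assumed
    to have been computed beforehand from mu (step 1).
    """
    y_row = x_row @ g % 2
    return y_row[~chart_row]
```

Stacking the users' full rows $X(i, \cdot)G_n$ gives exactly $XG_n$, and since $G_nG_n = I$ over $\mathbb{F}_2$, the decoder inverts with the same matrix.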

From Theorem 2 we have the following result.

Corollary 1. [Distributed polar compression] For $m$ correlated sources of joint distribution $\mu$, the previously described scheme performs lossless and distributed compression of the sources at sum-rate $H(\mu)$, with an error probability of $O(2^{-n^\beta})$, $\beta < 1/2$, and an encoding and decoding complexity of $O(n \log n)$.

Note that this result achieves the sum-rate of the Slepian-Wolf region, i.e., a rate belonging to the dominant face of the Slepian-Wolf achievable rate region; it does not say that every rate in that region can be reached with the proposed scheme.



B. Polarization for arbitrary finite fields

In [?], the source polarization result is stated for sources that are i.i.d. and $q$-ary, where $q$ is prime. It is also mentioned that if $q$ is not prime, the theorem may fail. In particular, an example for $q = 4$ is provided where the conclusion of Theorem 1 does not hold. It is also mentioned that if additional randomness is introduced in the construction of the polar transformation (so that it is no longer the deterministic matrix $G_n$), the result holds for arbitrary powers of primes. We show here that a generalized polarization phenomenon still holds for arbitrary powers of primes (we formally show it for powers of 2 only, but any prime would work), even for the deterministic polar transform $G_n$.

Corollary 2. [Polarization for finite fields] Let $X = [X_1, \dots, X_n]$ be i.i.d. under $\mu$ on $\mathbb{F}_q$, where $q = 2^m$, and let $Y = XG_n$ (computed over $\mathbb{F}_q$). Then, although $Y$ may not polarize over $\mathbb{F}_{2^m}$, it polarizes over $\mathbb{F}_2^m$ in the sense of Theorem 2. More precisely, denote by $V$ an $\mathbb{F}_2^m$ representation of $\mathbb{F}_{2^m}$, by $\tilde{\mu}$ the distribution on $\mathbb{F}_2^m$ induced by $\mu$ on $\mathbb{F}_{2^m}$, and set $\tilde{Y} := V(Y)$ (organized as an $m \times n$ matrix). Then the conclusions of Theorem 2 hold for $\tilde{Y}$.

Note: this result still holds when $q$ is a power of any prime, by combining it with the result in [?] for prime alphabets. The case $q = 2^m$ is particularly interesting for complexity considerations (cf. Section V).

Interpretation of Corollary 2: When $q$ is a prime, $H(Y_j \mid Y^{j-1}) \in \{0, \log_2 q\} \pm \varepsilon$, which means that $Y_j$ is either roughly uniform and independent of the past or roughly a deterministic function of the past. However, for $q$ a power of 2 (or a power of a prime), we only get that $H(Y_j \mid Y^{j-1}) \in \{0, 1, \dots, \log_2 q\} \pm \varepsilon$, and the previous conclusion cannot be drawn, indicating a different polarization phenomenon. Corollary 2 says that if we work with the vector representation of the elements of $\mathbb{F}_q$, we still have a 'polarization phenomenon' in the sense of Theorem 2, i.e., for almost all $j \in [n]$, a subset of the components of $\tilde{Y}_j$ is roughly uniform and independent of the past, while the complementary components are roughly deterministic functions of the past and of that subset.

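Compression of $2^m$-ary i.i.d. sources: For a given $X = [X_1, \dots, X_n]$, compute $Y = XG_n$ and transform $Y$ into $\tilde{Y}$ based on the representation of $\mathbb{F}_{2^m}$ by $\mathbb{F}_2^m$. Organize $\tilde{Y}$ as an $m \times n$ matrix. Note that one can equivalently map $X$ into $\tilde{X}$ and then apply $G_n$ to get $\tilde{Y}$: since $G_n$ has entries in $\{0, 1\}$, computing $XG_n$ only involves additions, and addition in $\mathbb{F}_{2^m}$ corresponds to componentwise addition in $\mathbb{F}_2^m$. Finally, store $\tilde{Y}$ on $D_\varepsilon(\tilde{\mu})^c$, and run polar-matrix-dec to recover $\tilde{Y}$, hence $Y$ and $X$.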
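The equivalence between transforming over $\mathbb{F}_{2^m}$ and transforming the bit planes can be checked mechanically. In this sketch (assumptions: NumPy; an element of $\mathbb{F}_{2^m}$ is stored as an integer in $[0, 2^m)$ whose bits are its coordinates in some fixed $\mathbb{F}_2$-basis, playing the role of $V$), the transform over $\mathbb{F}_{2^m}$ uses only XOR, because $G_n$ has 0/1 entries and no field multiplication is ever needed.

```python
import numpy as np


def transform_gf2m(x) -> np.ndarray:
    """Y = X G_n over F_{2^m}, elements stored as ints in [0, 2^m).

    G_n has 0/1 entries, so only field additions occur, and addition in
    F_{2^m} is XOR of the coordinate bits. Works along the last axis, so it
    applies equally to a length-n vector over F_{2^m} or to an m x n bit matrix.
    """
    y = np.array(x, dtype=np.int64)
    n = y.shape[-1]
    half = n // 2
    while half >= 1:
        for s in range(0, n, 2 * half):
            y[..., s:s + half] ^= y[..., s + half:s + 2 * half]
        half //= 2
    return y


def bit_planes(v, m: int) -> np.ndarray:
    """The map V: length-n vector over F_{2^m} -> m x n binary matrix (row b = bit b)."""
    v = np.asarray(v, dtype=np.int64)
    return np.array([(v >> b) & 1 for b in range(m)], dtype=np.int64)
```

Transforming and then splitting into bit planes gives the same $m \times n$ matrix as splitting first and transforming each plane, for any choice of basis.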



C. Source polarization for non-i.i.d. sources

Let a binary source consist of i.i.d. blocks of length $m$, each block having an arbitrary distribution $\mu$. We can then compress the source as follows. From $n$ blocks $X_1, \dots, X_n$, each of length $m$, i.e., $mn$ outputs of the source, create the matrix $X = [X_1 | \dots | X_n]$ (one block per column) and apply the polar transform to get $Y = XG_n$. Then store the components of $Y$ which belong to $D_\varepsilon(\mu)^c$. To reconstruct $X$, reconstruct $Y$ from $Y[D_\varepsilon(\mu)^c]$ using polar-matrix-dec and find $X = YG_n$.

If the source is not exactly block i.i.d. but is mixing, i.e., if $\lim_{n \to \infty} P\{X_n = x \mid X_0 = x_0\} = P\{X_n = x\}$ for any $x_0$, we can open windows of length $o(m)$ between the blocks and store these $o(mn)$ inter-block bits without compression, which does not increase the compression rate. We are then left with a source formed by blocks which are 'almost' i.i.d., and a similar procedure can be used.
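The interleaving step and the windowing variant amount to simple reshaping. A minimal sketch, assuming NumPy and 0-based indices; the window length `w` is a hypothetical parameter chosen by the user (the text only requires the total window length to be negligible), and the number of blocks must still be a power of 2 before applying $G_n$.

```python
import numpy as np


def blocks_to_matrix(stream, m: int) -> np.ndarray:
    """Arrange m*n source outputs into the m x n matrix X = [X_1 | ... | X_n]."""
    s = np.asarray(stream, dtype=np.int64)
    assert s.size % m == 0, "stream length must be a multiple of m"
    return s.reshape(-1, m).T          # row k of the reshape is block k -> column k


def matrix_to_blocks(x: np.ndarray) -> np.ndarray:
    """Inverse of blocks_to_matrix: read the columns back in order."""
    return x.T.reshape(-1)


def split_with_windows(stream, m: int, w: int):
    """Cut the stream into blocks of length m, each followed by a window of w symbols.

    Returns (X, windows): X has one block per column, windows holds the
    skipped symbols, which are stored uncompressed. Trailing symbols that do
    not fill a whole block-plus-window period are ignored here.
    """
    s = np.asarray(stream, dtype=np.int64)
    period = m + w
    n = s.size // period
    assert n >= 1, "stream too short for one block"
    x = np.stack([s[k * period:k * period + m] for k in range(n)], axis=1)
    windows = np.concatenate([s[k * period + m:(k + 1) * period] for k in range(n)])
    return x, windows
```

The decoder reverses the same bookkeeping: after recovering $X$, it reads the columns back in order and re-inserts the stored window symbols between them.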

From Theorem 2 we have the following.

Corollary 3. For a binary source consisting of i.i.d. blocks of length $m$, each block having distribution $\mu$, the polar coding scheme described previously compresses the source losslessly at rate $H(\mu)$ (per block of $m$ source outputs), with an error probability of $O(2^{-n^\beta})$, $\beta < 1/2$, and an encoding and decoding complexity of $O(n \log n)$.

As discussed previously, a similar result holds for sources which are mixing.

IV. Extractors in computer science

We have discussed in this paper a procedure to extract randomness, i.e., uniform bits, from non-uniform bits. The applications we considered are in compression and coding, but randomness extraction also has numerous applications in computer science. In particular, there is a notion of "extractor" in theoretical computer science, which aims at extracting uniform bits from sources satisfying much more general assumptions than the ones considered here.

Phrased in our terminology, an extractor is roughly a map that extracts $m$ bits that are $\varepsilon$-uniform from $n$ bits that have a total entropy of at least $k$, with the help of a seed of $d$ uniform bits. For more details and a survey on extractors, see for example [9], [6]. The notion of $\varepsilon$-uniform, or $\varepsilon$-close to uniform, used in computer science is usually measured by the $\ell_1$-norm, rather than by the entropy as in this paper. Nevertheless, these two notions can be related, and this is a minor distinction. Also, the entropy used in the computer science literature is the min-entropy rather than the Shannon entropy, which is a stronger assumption, since the Shannon entropy is an upper bound to the min-entropy. On the other hand, the source in the extractor definition is only assumed to have min-entropy $k$, and no further assumptions are made on the distribution of $X_1, \dots, X_n$, whereas in our setting we consider sources that are at least ergodic and with a known distribution. One should also stress that we did not make use of any seed in our problem.¹

In order to establish a more concrete connection between polar coding and formal extractors, we present here a result which addresses one of the two caveats just mentioned: we only assume that the source has entropy at least $k$, without requiring exact knowledge of the distribution, but we keep an i.i.d. setting. Using Section III-C, one can generalize this result to a setting where the source is mixing, but even then we do not make use of any seed. In particular, if one could use a seed, ideally of small size, e.g., $O(\log n)$, to turn an arbitrary source of lower-bounded entropy into a mixing source of comparable entropy, one could use the following result to construct real extractors (work in progress).

Definition 3. Let $(k, \varepsilon)$-$P_{\mathrm{ext}} : \mathbb{F}_2^n \to \mathbb{F}_2^m$ be the matrix obtained by deleting the columns of $G_n$ that are not in $R_{\varepsilon^2/2n}(p(k))$, where $p(k)$ is one of the two binary distributions having entropy $H(p(k)) = k/n$ (and $R_\varepsilon(\cdot)$ is as defined in (2)).

Note that $P_{\mathrm{ext}}$ benefits from the low encoding complexity of $G_n$, namely $O(n \log n)$.
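Applying $P_{\mathrm{ext}}$ amounts to computing $xG_n$ and keeping the coordinates in the selected index set. The sketch below assumes NumPy and takes the index set as an argument standing in for $R_{\varepsilon^2/2n}(p(k))$; computing that set is not shown, and the index set in the checks is made up. Computing the full product with the butterfly and then discarding coordinates keeps the $O(n \log n)$ cost.

```python
import numpy as np


def polar_extract(x, kept) -> np.ndarray:
    """Sketch of P_ext: the coordinates of x G_n (over F_2) indexed by `kept`.

    `kept` plays the role of R_{eps^2/2n}(p(k)) (0-based indices); it is an
    input here, not computed. Equals x @ G_n[:, kept] % 2.
    """
    y = np.array(x, dtype=np.int64) % 2
    n = y.size
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    half = n // 2
    while half >= 1:
        for s in range(0, n, 2 * half):
            y[s:s + half] ^= y[s + half:s + 2 * half]
        half //= 2
    return y[sorted(kept)]
```

The output length is the size of the kept set, which is the $m$ of Definition 3.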

Lemma 3. Let $n$ be a power of two and $X = [X_1, \dots, X_n]$ be i.i.d. Bernoulli such that $H(X) \ge k$ (where $H$ denotes the Shannon or min-entropy). For any $\varepsilon \in (0,1)$, $P_{\mathrm{ext}}(X)$ is $\varepsilon$-uniform (in the $\ell_1$ or entropy sense) and

$$m = k + o(n).$$

This result is proved in Section VI-C and, using Section III-C, it can be extended to a setting where the source is mixing. Note that even in a mixing setting, the source entropy is $\Omega(n)$, which is indeed a regime where good extractors are known.

V. Discussion

We have treated in this paper three problems, namely, compression of correlated sources, of sources with memory, and of sources on finite fields, with a unified approach using a matrix polarization (Theorem 2), and we provided polar coding schemes for each of these problems. The advantage of polar coding schemes is that they have low encoding and decoding complexity and achieve the optimal performance (Shannon limit), while affording mathematical guarantees on the performance, as described in Corollaries 1 and 3.

One can now also combine these different problems. Namely, for multiple sources that are defined on some finite fields, with some well-behaved correlations between and within themselves, one can, using the interleaving trick and the vector representation described in Sections III-C and III-B, respectively, organize the source outputs in matrix form so as to meet the hypotheses of Theorem 2, and hence obtain a polar compression scheme requiring the minimal compression rate.
One can also translate the results in this paper to a channel setting, such as $m$-user multiple access channels (already treated in [2]), channels with memory or channels with non-binary field inputs, by using duality arguments.

¹Note that, as opposed to the compression problem, when one is only concerned with randomness extraction, the treatment of the deterministic bits and the reconstruction algorithm may not matter.

Although the results in this paper are expected to hold when $m = o(n)$, one has to be careful with the complexity scaling when $m$ gets large. In that regard, an advantage of using finite fields of cardinality $q = 2^m$ rather than modular fields of prime cardinality is that some operations required in the polar decoding algorithm are convolution-like operations over the underlying field, and since the FFT algorithm reduces the computational cost of a convolution from $O(q^2)$ to $O(q \log_2 q)$ when $q$ is a power of 2, one can benefit from this fact.
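For $q = 2^m$, addition in $\mathbb{F}_q$ is XOR of coordinate vectors, so the law of a sum of two independent $\mathbb{F}_q$-valued variables (as in $V^- = V + V'$) is an XOR-convolution of their distributions. The text does not specify which fast transform it has in mind; for the group $(\mathbb{Z}/2)^m$ the natural "FFT" is the Walsh-Hadamard transform, and the sketch below (NumPy assumed, names illustrative) uses it to compute the XOR-convolution in $O(q \log q)$, alongside the direct $O(q^2)$ sum for comparison.

```python
import numpy as np


def fwht(a) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform, O(q log q) for q = 2^m."""
    a = np.array(a, dtype=float)
    h = 1
    while h < a.size:
        for s in range(0, a.size, 2 * h):
            u = a[s:s + h].copy()
            v = a[s + h:s + 2 * h].copy()
            a[s:s + h] = u + v
            a[s + h:s + 2 * h] = u - v
        h *= 2
    return a


def xor_convolve(p, r) -> np.ndarray:
    """(p * r)(z) = sum_x p(x) r(z XOR x): law of X + X' for independent X ~ p, X' ~ r."""
    q = len(p)
    return fwht(fwht(p) * fwht(r)) / q      # the transform is its own inverse up to 1/q


def xor_convolve_naive(p, r) -> np.ndarray:
    """Direct O(q^2) evaluation of the same convolution."""
    q = len(p)
    out = np.zeros(q)
    for x in range(q):
        for z in range(q):
            out[z] += p[x] * r[z ^ x]
    return out
```

The transform diagonalizes XOR-convolution in the same way the ordinary FFT diagonalizes cyclic convolution, which is where the $O(q \log q)$ cost comes from.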

We have assumed in this paper that the sets $D_\varepsilon(\mu)$ and $R_\varepsilon(\mu)$ can be computed, without discussing how. The first reason why we do not stress this aspect here, as in other papers on polar coding, is that these sets do not depend on the realization of the source(s). Namely, if one is able to compute these sets once for several values of interest of $\varepsilon$ and of the dimensions, one can then use the same sets for any output realizations. This is fundamentally different from the decoding algorithm, which takes the source realization as an input. Yet, it is still crucial to be able to compute these sets once, for the parameters of interest. In order to do so, there are at least two possible approaches. The first one is via simulations, and is discussed in [3]: using the Kronecker structure of $G_n$, it is possible to run simulations and get accurate estimates of the conditional entropies $H(Y_j \mid Y^{j-1})$, and in particular (via Section II-A) of the sets $D_\varepsilon(\mu)$ and $R_\varepsilon(\mu)$. Another option is to use algorithms that approximate the exact values of $H(Y_j \mid Y^{j-1})$ within a given precision, in linear time; this has been proposed in particular in [8]. It would also be interesting to have mathematical characterizations of these sets. At the moment, this is an open problem, even for the simplest settings (a single binary i.i.d. source or, in the channel setting, the binary erasure channel).
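A cheap complement to simulation, for a single binary i.i.d. source ($m = 1$), is to propagate upper bounds on the Bhattacharyya parameters $Z(Y_j \mid Y^{j-1})$ using the two inequalities stated in the proof of Lemma 8 (Section VI-B): $z \mapsto z^2$ on a '+' step and $z \mapsto 2z$ on a '−' step, capped at 1 since $Z \le 1$. Because both maps are nondecreasing, the iterated values remain valid upper bounds, and by (9) any index whose bound is at most $\varepsilon$ is certified to satisfy $H(Y_j \mid Y^{j-1}) \le \varepsilon$. This is only a sketch: it certifies a subset of the low-entropy indices and is not the method of [8].

```python
import math


def bhattacharyya_upper_bounds(p: float, n: int) -> list:
    """Upper bounds on Z(Y_j | Y^{j-1}), j = 1..n, for X i.i.d. Bernoulli(p) and m = 1.

    Starts from Z(X) = 2 sqrt(p (1 - p)); each stage doubles the list, the
    '-' child of z getting min(1, 2z) and the '+' child getting z^2. The
    binary expansion of j - 1 (most significant bit first, 0 = '-', 1 = '+')
    is the branch sequence of Y_j for Y = X G_n.
    """
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    z = [2.0 * math.sqrt(p * (1.0 - p))]
    while len(z) < n:
        z = [child for v in z for child in (min(1.0, 2.0 * v), v * v)]
    return z


def certified_low_entropy(p: float, n: int, eps: float) -> list:
    """1-based indices j whose bound guarantees H(Y_j | Y^{j-1}) <= Z <= eps."""
    return [j + 1 for j, b in enumerate(bhattacharyya_upper_bounds(p, n)) if b <= eps]
```

For $p = 0.11$ and $n = 8$, only the all-'+' index $j = 8$ is certified at $\varepsilon = 0.05$; the bounds tighten quickly as $n$ grows, but they remain conservative.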

Finally, this work could also apply to the matrix completion setting. For example, if $X$ is an $m \times n$ matrix where column $X_j$ contains the ratings of $m$ movies by user $j$, we can use Theorem 2 to show that by applying the matrix² $G_n \times I(D_\varepsilon^c)$ to $X$, we are left with fewer entries (the more correlated the movie ratings, the fewer entries) that still allow us to recover the initial matrix. Hence, if we are given only a smaller set of appropriate entries (sets which can be characterized using Section II-A), we can reconstruct the initial matrix using polar-matrix-dec.

VI. Proofs

A. Proof of Lemma 1

In order to prove Lemma 1, we need the following definition and lemmas.

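Definition 4. For a random vector $V$ distributed over $\mathbb{F}_2^m$, define $V^- = V + V'$ and $V^+ = V'$, where $V'$ is an i.i.d. copy of $V$. Let $\{b_k\}_{k \ge 1}$ be i.i.d. binary random variables in $\{-, +\}$ with uniform probability distribution, and let

$$\eta_k[S] = H\big(V^{b_1 \cdots b_k}[S] \,\big|\, V^{c_1 \cdots c_k},\ (c_1, \dots, c_k) < (b_1, \dots, b_k)\big)$$

for $S \subseteq [m]$, where the order between $(-,+)$-sequences is the lexicographic order (with $- < +$).

Note that the vectors $V^{c_1 \cdots c_k}$, $(c_1, \dots, c_k) \in \{-,+\}^k$, listed in lexicographic order, are jointly distributed as the columns of $XG_{2^k}$, where $X$ is the matrix whose columns are i.i.d. copies of $V$. The following lemma justifies the definition of the previous random processes.

²The matrix obtained by deleting the columns of $G_n$ that are not in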

Lemma 4. Using $V \sim \mu$ in the definition of $\eta_k[S]$, we have, for any $n$ and any set $D \subseteq [0, |S|]$,

$$\frac{1}{n}\left|\{ j \in [n] : H(Y_j[S] \mid Y^{j-1}) \in D \}\right| = P\{\eta_{\log_2(n)}[S] \in D\}.$$

The proof is a direct consequence of the fact that the $b_k$'s are i.i.d. uniform. Using the invertibility of $G_2$ and properties of the conditional entropy, we have the following.

Lemma 5. $\eta_k[S]$ is a super-martingale with respect to $b_k$ for any $S \subseteq [m]$, and a martingale for $S = [m]$.

Proof: For $n = 2$, we have

$$\begin{aligned}
2H(X_1[S]) &= H(X_1[S], X_2[S]) \\
&= H(Y_1[S], Y_2[S]) \\
&= H(Y_1[S]) + H(Y_2[S] \mid Y_1[S]) \\
&\ge H(Y_1[S]) + H(Y_2[S] \mid Y_1), \qquad (7)
\end{aligned}$$

with equality in the last step if $S = [m]$. For $n > 2$, the same expansion holds, including in the conditioning the appropriate "past" random variables. ∎

Note that because $\eta_k[S]$ is a martingale for $S = [m]$, the sum-rate is conserved through the polarization process.
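Inequality (7) and its equality case $S = [m]$ can be checked numerically for $n = 2$. The sketch below enumerates the joint law of $(Y_1, Y_2)$ for a small, arbitrarily chosen correlated $\mu$ on $\mathbb{F}_2^2$ (the numbers in the checks are illustrative only) and returns both sides of (7); coordinates in $S$ are 0-based, and the helper names are hypothetical.

```python
import itertools
import math


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)


def marginal(pmf: dict, key) -> dict:
    """Push a pmf forward through the function `key`."""
    out = {}
    for outcome, q in pmf.items():
        k = key(outcome)
        out[k] = out.get(k, 0.0) + q
    return out


def one_step(mu: dict, s: tuple) -> tuple:
    """Both sides of (7) for n = 2: (2 H(X_1[S]), H(Y_1[S]) + H(Y_2[S] | Y_1)).

    mu is a pmf on F_2^m as {bit_tuple: probability}; X_1, X_2 are i.i.d. ~ mu,
    Y_1 = X_1 + X_2 and Y_2 = X_2, i.e. [Y_1, Y_2] = [X_1, X_2] G_2.
    """
    def restrict(v):
        return tuple(v[i] for i in s)

    joint = {}   # law of (Y_1, Y_2)
    for (a, pa), (b, pb) in itertools.product(mu.items(), repeat=2):
        y1 = tuple(u ^ w for u, w in zip(a, b))
        joint[(y1, b)] = joint.get((y1, b), 0.0) + pa * pb

    lhs = 2 * entropy(marginal(mu, restrict))
    h_y1_s = entropy(marginal(joint, lambda o: restrict(o[0])))
    # H(Y_2[S] | Y_1) = H(Y_1, Y_2[S]) - H(Y_1)
    h_cond = (entropy(marginal(joint, lambda o: (o[0], restrict(o[1]))))
              - entropy(marginal(joint, lambda o: o[0])))
    return lhs, h_y1_s + h_cond
```

For proper subsets $S$ the right-hand side can fall strictly below $2H(X_1[S])$, which is the super-martingale property; for $S = [m]$ both sides coincide.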

Now, using the previous lemma and the fact that $\eta_k[S] \in [0, |S|]$ for any $S$, the martingale convergence theorem implies the following.

Corollary 4. For any $S \subseteq [m]$, $\eta_k[S]$ converges almost surely.

The following allows us to characterize the possible values of the process $\eta_k[S]$ when it converges.

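Lemma 6. For any $\varepsilon > 0$, $X$ valued in $\mathbb{F}_2^m$, $Z$ arbitrary, $(X', Z')$ an i.i.d. copy of $(X, Z)$, and $S \subseteq [m]$, there exists $\delta = \delta(\varepsilon)$ such that $H(X'[S] \mid Z') - H(X'[S] \mid Z, Z', X[S] + X'[S]) \le \delta$ implies $H(X'[S] \mid Z') - H(X'[S \setminus i] \mid Z') \in \{0, 1\} \pm \varepsilon$ for any $i \in S$.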

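Proof: We have

$$\begin{aligned}
&H(X'[S] \mid Z') - H(X'[S] \mid Z, Z', X[S] + X'[S]) \\
&\quad = I(X'[S]; X[S] + X'[S] \mid Z, Z') \\
&\quad \ge I(X'[S]; X[i] + X'[i] \mid Z, Z') \\
&\quad \ge I(X'[i]; X[i] + X'[i] \mid Z, Z', X'[S \setminus i]) \\
&\quad = H(X'[i] \mid Z', X'[S \setminus i]) - H(X'[i] \mid Z, Z', X'[S \setminus i], X[i] + X'[i]). \qquad (8)
\end{aligned}$$

It is shown in [?] that if $A_1, A_2$ are binary random variables and $B_1, B_2$ are arbitrary such that $P_{A_1 A_2 B_1 B_2}(a_1, a_2, b_1, b_2) = \frac{1}{4} Q(b_1 \mid a_1 + a_2)\, Q(b_2 \mid a_2)$, for some conditional probability $Q$, then, for any $a > 0$, there exists $b > 0$ such that $H(A_2 \mid B_2) - H(A_2 \mid B_1 B_2 A_1) \le b$ implies $H(A_2 \mid B_2) \in \{0, 1\} \pm a$. Using this result, we can pick $\delta$ small enough to lower bound (8) and show that $H(X'[i] \mid Z', X'[S \setminus i]) \in \{0, 1\} \pm \varepsilon$. Since, by the chain rule, $H(X'[S] \mid Z') - H(X'[S \setminus i] \mid Z') = H(X'[i] \mid Z', X'[S \setminus i])$, we conclude that $H(X'[S] \mid Z') - H(X'[S \setminus i] \mid Z') \in \{0, 1\} \pm \varepsilon$. ∎

We then get the following using Corollary 4 and Lemma 6.

Corollary 5. With probability one, $\lim_{k \to \infty} \eta_k[S] \in \{0, 1, \dots, |S|\}$.

Finally, Lemma 4 and Corollary 5 imply Lemma 1.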

B. Proof of Lemma 2 and Theorem 2 part (2)

In order to prove Theorem 2 part (2), we basically need to show that part (1) still holds when taking $\varepsilon$ scaling like $\varepsilon_n = 2^{-n^\alpha}$ for $\alpha < 1/2$, as in [5]. We did not find a direct way to show that when $\eta_k[S]$ converges to $|S|$, it must do so that fast (the super-martingale characterization is too weak to apply the results of [?] directly). This is why we looked into Lemma 2. By developing a correspondence between the previous results and analogous results dealing with linear forms of the $X[S]$'s, we are able to use the speed-of-convergence results shown for the single-user setting and conclude. This approach was developed in [2] for the multiple access channel; below is the counterpart for our source setting.

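Lemma 7. For a random vector $Y$ valued in $\mathbb{F}_2^m$ and an arbitrary random vector $Z$, if

$$H(Y[S] \mid Z) \in \{0, 1, \dots, |S|\} \pm \varepsilon$$

for any $S \subseteq [m]$, we have

$$H\Big(\sum_{i \in S} Y[i] \,\Big|\, Z\Big) \in \{0, 1\} \pm \delta(\varepsilon),$$

with $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.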

This lemma is proved in [1]. Using this result, we have that for $j \in E_n$, there exists a matrix $A_j$ of rank $r_j = |S_j|$ such that

$$H(A_j Y_j \mid Y^{j-1}) \le m\delta(\varepsilon).$$

This implies the first part of Lemma 2, and we now show how to use this other characterization of the dependencies in $Y$ to conclude a speed-of-convergence result. We first need the following "single-user" result.

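Lemma 8. For any $\beta < 1/2$, $\varepsilon \in (0,1)$ and $\varepsilon_n = 2^{-n^\beta}$, we have, for any nonempty $S \subseteq [m]$,

$$\frac{1}{n}\left|\Big\{ j \in [n] : \varepsilon_n < H\Big(\sum_{i \in S} Y_j[i] \,\Big|\, Y^{j-1}\Big) < \varepsilon \Big\}\right| \xrightarrow[n\to\infty]{} 0.$$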

Proof: We define the auxiliary family of random processes $\zeta_k[S]$, for $S \subseteq [m]$, by

$$\zeta_k[S] = Z\Big(\sum_{i \in S} V^{b_1 \cdots b_k}[i] \,\Big|\, V^{c_1 \cdots c_k},\ (c_1, \dots, c_k) < (b_1, \dots, b_k)\Big),$$

where, for a binary random variable $A$ and an arbitrary random variable $B$, $Z(A \mid B) = 2\,\mathbb{E}_B\big(P\{A = 0 \mid B\}\, P\{A = 1 \mid B\}\big)^{1/2}$ is the Bhattacharyya parameter. Note that

$$Z(A \mid B) \ge H(A \mid B). \qquad (9)$$

(This also follows from Proposition 2 in [4].) We then have, using the chain rule and the source polarization inequalities on the Bhattacharyya parameter, namely Proposition 1 in [4], that

$$\zeta_{k+1}[S] \le \zeta_k[S]^2 \ \text{ if } b_{k+1} = +, \qquad \zeta_{k+1}[S] \le 2\zeta_k[S] \ \text{ if } b_{k+1} = -,$$

and using Theorem 3 of [5], we conclude that for any $\alpha < 1/2$,

$$\liminf_{k \to \infty} P\big(\zeta_k[S] \le 2^{-2^{\alpha k}}\big) \ge P(\zeta_\infty[S] = 0).$$

Finally, we conclude using (9). ∎
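Both quantities in (9) are easy to evaluate from a joint pmf. The following sketch (function name illustrative; $A$ binary, $B$ taking any hashable values) computes $H(A \mid B)$ and $Z(A \mid B)$ as defined above, which can be used to spot-check (9) on examples such as a binary input observed through a binary symmetric channel.

```python
import math


def h_and_z(joint: dict) -> tuple:
    """(H(A|B) in bits, Z(A|B)) for binary A, from a joint pmf {(a, b): prob}.

    Z(A|B) = 2 E_B[(P{A=0|B} P{A=1|B})^{1/2}] = sum_b 2 sqrt(P(0, b) P(1, b)).
    """
    h = z = 0.0
    for b in {b for (_, b) in joint}:
        p0, p1 = joint.get((0, b), 0.0), joint.get((1, b), 0.0)
        pb = p0 + p1
        if pb == 0:
            continue
        for pa in (p0, p1):
            if pa > 0:
                h -= pa * math.log2(pa / pb)      # -P(a, b) log2 P(a | b)
        z += 2.0 * math.sqrt(p0 * p1)
    return h, z
```

When $B$ carries no information, the two quantities reduce to $H(p)$ and $2\sqrt{p(1-p)}$ for $A \sim$ Bernoulli($p$).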
We then use Lemmas 7 and 8 to conclude that

$$\frac{1}{n}\Big|\Big\{ j \in [n] : H(Y_j[S] \mid Y^{j-1}) \in \{0, 1, \dots, |S|\} \pm \varepsilon,\ \forall S \subseteq [m],\ \exists A_j \text{ with } \mathrm{rank}(A_j) = \mathrm{int}\big(m - H(Y_j \mid Y^{j-1})\big),\ H(A_j Y_j \mid Y^{j-1}) \le \varepsilon_n \Big\}\Big| \xrightarrow[n\to\infty]{} 1, \qquad (10)$$

which implies Lemma 2. To conclude the proof of Theorem 2 part (2), let $\varepsilon_n = 2^{-n^\alpha}$ and $E_n = E_n(\varepsilon_n)$ be the set defined through (10) (which, in view of the previous results, is equivalent to the definition given in Section II-A). We then have, for $j \in E_n$, that the components $S_j$ to be decoded in $Y_j$ are incorrectly decoded with probability

$$P_e(j) \le H(A_j Y_j \mid Y^{j-1}) \le \varepsilon_n,$$

and the block error probability is bounded as

$$P_e \le \sum_{j \in E_n} P_e(j) \le n\varepsilon_n,$$

so that taking $\alpha < 1/2$ large enough, we can reach a block error probability of $O(2^{-n^\beta})$ for any $\beta < 1/2$.

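C. Proof of Lemma 3

Let $\tilde{\varepsilon} = \varepsilon^2/2n$. For $i \in R_{\tilde{\varepsilon}}(p(k))$,

$$H\big(Y_i(p(k)) \mid Y^{i-1}(p(k))\big) \ge 1 - \tilde{\varepsilon},$$

where $Y(p(k)) = X(p(k))\, G_n$ and $X(p(k))$ is i.i.d. under $p(k)$. Moreover, for any distribution $p$ on $\mathbb{F}_2$ such that $H(p) \ge H(p(k)) = k/n$, there exists a distribution $\nu$ on $\mathbb{F}_2$ such that $p(k) * \nu = p$, where $*$ denotes the circular convolution. Equivalently, there exists $Z \sim \nu$, independent of $X(p(k)) \sim p(k)$, such that $X(p) = X(p(k)) \oplus Z \sim p$. Define $Y(p) = X(p)\, G_n$, $Y(p(k)) = X(p(k))\, G_n$ and $W = Z G_n$, hence $Y(p) = Y(p(k)) \oplus W$. We have

$$\begin{aligned}
H\big(Y(p)_i \mid Y(p)^{i-1}\big) &\ge H\big(Y(p)_i \mid Y(p)^{i-1}, W\big) \\
&= H\big(Y(p(k))_i \mid Y(p(k))^{i-1}, W\big) \\
&= H\big(Y(p(k))_i \mid Y(p(k))^{i-1}\big), \qquad (11)
\end{aligned}$$

where the last equality follows from the fact that $Y(p(k))$ is independent of $W$, since $X(p(k))$ is independent of $Z$. Therefore, for any $X(p)$ i.i.d. such that $H(p) \ge k/n$ and for any $i \in R_{\tilde{\varepsilon}}(p(k))$, we have

$$H\big(Y(p)_i \mid Y(p)^{i-1}\big) \ge 1 - \tilde{\varepsilon} \qquad (12)$$

and

$$H\big(Y(p)[R_{\tilde{\varepsilon}}(p(k))]\big) \ge \sum_{i \in R_{\tilde{\varepsilon}}(p(k))} H\big(Y_i(p) \mid Y^{i-1}(p)\big) \ge |R_{\tilde{\varepsilon}}(p(k))|\,(1 - \tilde{\varepsilon}).$$

Hence, denoting by $\mu_R$ the distribution of $Y(p)[R_{\tilde{\varepsilon}}(p(k))]$ and by $U_R$ the uniform distribution on $\mathbb{F}_2^{|R_{\tilde{\varepsilon}}(p(k))|}$, we have

$$\begin{aligned}
D(\mu_R \,\|\, U_R) &= H(U_R) - H(\mu_R) \\
&\le |R_{\tilde{\varepsilon}}(p(k))|\, \tilde{\varepsilon} \\
&\le n\tilde{\varepsilon}. \qquad (13)
\end{aligned}$$

Using Pinsker's inequality and (13), we obtain

$$\|\mu_R - U_R\|_1 \le \big(2 \ln 2 \cdot D(\mu_R \,\|\, U_R)\big)^{1/2} \le \varepsilon.$$

Finally, we have from Theorem 1

$$|R_{\tilde{\varepsilon}}(p(k))| = k + o(n).$$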
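Two elementary facts used in this proof can be checked numerically. For binary distributions, Bernoulli($a$) $*$ Bernoulli($b$) $=$ Bernoulli($a(1-b) + b(1-a)$), so the $\nu$ with $p(k) * \nu = p$ can be solved in closed form; and Pinsker's inequality with $D$ measured in bits reads $\|\mu - U\|_1 \le (2 \ln 2 \cdot D(\mu \| U))^{1/2}$. The sketch below uses plain Python; the parameter values in the checks are arbitrary illustrations.

```python
import math


def binary_conv(a: float, b: float) -> float:
    """P(U + V = 1) over F_2 for independent U ~ Bern(a), V ~ Bern(b)."""
    return a * (1 - b) + b * (1 - a)


def solve_noise(a: float, c: float) -> float:
    """The b with Bern(a) * Bern(b) = Bern(c).

    Requires a != 1/2 and c between a and 1 - a, i.e. H(c) >= H(a), which is
    the condition H(p) >= H(p(k)) in the proof.
    """
    b = (c - a) / (1 - 2 * a)       # from c = a + b (1 - 2a)
    assert 0.0 <= b <= 1.0, "need H(c) >= H(a)"
    return b


def pinsker_sides(mu: list) -> tuple:
    """(||mu - U||_1, sqrt(2 ln 2 * D(mu || U))) with D in bits and U uniform."""
    q = len(mu)
    d_bits = sum(x * math.log2(x * q) for x in mu if x > 0)   # = log2(q) - H(mu)
    l1 = sum(abs(x - 1 / q) for x in mu)
    return l1, math.sqrt(2 * math.log(2) * d_bits)
```

With $\tilde{\varepsilon} = \varepsilon^2/2n$, bound (13) gives $D \le \varepsilon^2/2$, so the Pinsker bound is $\varepsilon\sqrt{\ln 2} \le \varepsilon$, which is the last step of the proof.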